Visualizations - Sales Impact Analysis
This phase visualizes temporal patterns in sales performance, analyzing daily and monthly trends across stores to uncover seasonality, peak periods, and the operational rhythms that shape store outcomes. Unlike other analytical stages, it does not include correlation studies; instead, it concentrates on trend-based insights into how sales evolve over time, laying the groundwork for more targeted forecasting and strategic planning.
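As a minimal sketch of the daily-versus-monthly view described above, the snippet below rolls store-level rows up into chain-wide daily totals and then into a monthly average of those totals. The toy DataFrame and its values are made up for illustration; only the column names `date`, `store`, and `sales` mirror the dataset used later.

```python
import pandas as pd

# Toy store-level data (hypothetical values): two stores over three days spanning two months
df = pd.DataFrame({
    'date': pd.to_datetime(['2013-01-30', '2013-01-30', '2013-01-31', '2013-01-31',
                            '2013-02-01', '2013-02-01']),
    'store': [1, 2, 1, 2, 1, 2],
    'sales': [100, 200, 150, 250, 300, 100],
})

# Chain-wide daily totals: one value per calendar day (DatetimeIndex named 'date')
daily_total = df.groupby('date')['sales'].sum()

# Monthly view: mean of the daily totals within each calendar month ('MS' = month start)
monthly_mean = daily_total.resample('MS').mean()

print(daily_total)
print(monthly_mean)
```

Averaging daily totals, rather than summing raw rows per month, keeps months of different lengths comparable.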
1. Setup & Import Libraries
In [1]:
import time
In [2]:
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1) # Simulate processing time
Step 1: Setup and Import Libraries started...
In [3]:
# Data Manipulation & Processing
import math
import numpy as np
import pandas as pd
from pathlib import Path
import scipy.stats as stats
from datetime import datetime
from sklearn.preprocessing import *
# Data Visualization
import seaborn as sbn
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pandas.plotting import scatter_matrix
# to ensure Plotly works in both Jupyter and HTML export
pio.renderers.default = "notebook+plotly_mimetype"
sbn.set(rc={'figure.figsize':(14,6)})
plt.style.use('seaborn-v0_8')
sbn.set_palette("husl")
# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.float_format','{:.2f}'.format)
# Suppress warnings to keep notebook output readable
import warnings
warnings.filterwarnings('ignore')
In [4]:
print("="*60)
print("Rossmann Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossmann Store Sales Time Series Analysis - Part 2
============================================================
All libraries imported successfully!
Analysis Date: 2025-08-15 20:49:23
In [5]:
print("✅ Setup and Import Libraries completed.\n")
✅ Setup and Import Libraries completed.
In [6]:
# Start analysis
data_viz_begin = pd.Timestamp.now()
bold_start = '\033[1m'
bold_end = '\033[0m'
print("🔍 Part 2 Started ...")
print(f"🟢 Begin Date: {bold_start}{data_viz_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Part 2 Started ...
🟢 Begin Date: 2025-08-15 20:49:23
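The start timestamp `data_viz_begin` recorded above can be paired with an end timestamp to report how long this phase ran. A minimal sketch, assuming a closing cell that is not part of this excerpt; `format_elapsed` is a hypothetical helper and the timestamps are fixed, made-up values so the result is reproducible.

```python
import pandas as pd

def format_elapsed(begin: pd.Timestamp, end: pd.Timestamp) -> str:
    """Return the elapsed time between two timestamps as H:MM:SS."""
    total_seconds = int((end - begin).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# Hypothetical begin/end values; in the notebook these would come from pd.Timestamp.now()
begin = pd.Timestamp('2025-01-01 10:00:00')
end = pd.Timestamp('2025-01-01 10:12:42')
print(f"Part 2 finished in {format_elapsed(begin, end)}")
```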
Restore the Stored DataFrame
In [7]:
%store -r df_viz_feat
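`%store -r` relies on IPython's storemagic and only works if an earlier notebook ran `%store df_viz_feat` under the same IPython profile. Where that link is fragile, an explicit file round-trip is one alternative. The sketch below uses pandas' pickle support; the helper names and the file name `df_viz_feat.pkl` are hypothetical, not something the original notebooks use.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_frame(df: pd.DataFrame, path: Path) -> None:
    """Persist a DataFrame with pickle so dtypes (datetimes, categoricals) survive."""
    df.to_pickle(path)

def load_frame(path: Path) -> pd.DataFrame:
    """Reload a DataFrame saved by save_frame, failing loudly if the file is missing."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run the earlier notebook first")
    return pd.read_pickle(path)

# Round-trip demo in a temporary directory (hypothetical file name and toy values)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'df_viz_feat.pkl'
    original = pd.DataFrame({'store': [1, 2], 'sales': [100, 200]})
    save_frame(original, path)
    restored = load_frame(path)

print(restored.equals(original))
```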
Preview the Dataset
In [8]:
print("\nTrain Data Preview:")
print("\n",df_viz_feat.head())
Train Data Preview:
        store  dayofweek        date  sales  customers  open     promo  stateholiday  schoolholiday  day  week  month  quarter  year  isweekend  isholiday  isschoolDay
982643   1115          2  2013-01-01      0          0     0  No Promo        Public              1  Tue     1    Jan        1  2013      False       True        False
982640   1112          2  2013-01-01      0          0     0  No Promo        Public              1  Tue     1    Jan        1  2013      False       True        False
982639   1111          2  2013-01-01      0          0     0  No Promo        Public              1  Tue     1    Jan        1  2013      False       True        False
982638   1110          2  2013-01-01      0          0     0  No Promo        Public              1  Tue     1    Jan        1  2013      False       True        False
982637   1109          2  2013-01-01      0          0     0  No Promo        Public              1  Tue     1    Jan        1  2013      False       True        False
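The preview shows calendar columns (`day`, `week`, `month`, `quarter`, `year`, `isweekend`) derived from `date`. Their construction is not shown in this part; the sketch below is one plausible way to build them with the pandas `.dt` accessor, assuming `date` is a datetime column. It is an illustration, not necessarily how Part 1 produced them.

```python
import pandas as pd

# Toy dates (hypothetical): a Tuesday, a Saturday and a Monday
df = pd.DataFrame({'date': pd.to_datetime(['2013-01-01', '2013-01-05', '2013-04-15'])})

# Derive calendar fields from the date column
df['day'] = df['date'].dt.strftime('%a')                    # abbreviated weekday, e.g. 'Tue'
df['week'] = df['date'].dt.isocalendar().week.astype(int)   # ISO week number
df['month'] = df['date'].dt.strftime('%b')                  # abbreviated month, e.g. 'Jan'
df['quarter'] = df['date'].dt.quarter
df['year'] = df['date'].dt.year
# pandas numbers Monday=0 ... Sunday=6 (the Rossmann `dayofweek` column uses Monday=1)
df['isweekend'] = df['date'].dt.dayofweek >= 5
print(df)
```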
2. Data Visualization
In [9]:
# Step 2: Data Visualization
print("Step 2: Data Visualization started...")
time.sleep(1) # Simulate processing time
Step 2: Data Visualization started...
Box Plots by Time Segment
In [11]:
# Calendar orderings shared by the plots and the summary tables
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
plot_config = {'displayModeBar': True, 'displaylogo': False}
# Box plot by Month
fig1 = px.box(
    df_viz_feat,
    x='month',
    y='sales',
    title='Sales Distribution by Month',
    category_orders={'month': month_order}
)
fig1.update_layout(title_x=0.5, width=1200, height=500)
fig1.show(config=plot_config)
# Box plot by Day of Week
fig2 = px.box(
    df_viz_feat,
    x='day',
    y='sales',
    title='Sales Distribution by Day of Week',
    category_orders={'day': day_order}
)
fig2.update_layout(title_x=0.5, width=1200, height=500)
fig2.show(config=plot_config)
# Box plot by Year
fig3 = px.box(
    df_viz_feat,
    x='year',
    y='sales',
    title='Sales Distribution by Year'
)
fig3.update_layout(title_x=0.5, width=1200, height=500)
fig3.show(config=plot_config)
# Simple summary statistics, listed in calendar order (groupby on string labels sorts them alphabetically)
print("Sales Distribution Summary by Category:")
print("=" * 45)
print("\nBy Month:")
monthly_stats = (
    df_viz_feat.groupby('month')['sales']
    .agg(['mean', 'median', 'std'])
    .reindex(month_order)
    .round(0)
)
# Loop variable is `row`, not `stats`, so the scipy.stats module imported as `stats` is not overwritten
for month, row in monthly_stats.iterrows():
    print(f"{month}: Mean=€{row['mean']:,.0f}, Median=€{row['median']:,.0f}")
print("\nBy Day:")
daily_stats = (
    df_viz_feat.groupby('day')['sales']
    .agg(['mean', 'median', 'std'])
    .reindex(day_order)
    .round(0)
)
for day, row in daily_stats.iterrows():
    print(f"{day}: Mean=€{row['mean']:,.0f}, Median=€{row['median']:,.0f}")
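The preview rows are closed stores (`open` = 0) recording zero sales, and the box plots above include every row, so closed days drag the lower quartile and median toward zero. The sketch below computes the quartiles and Tukey whisker limits (1.5 × IQR) that a box plot typically draws, with and without closed days. The toy values are hypothetical and `box_stats` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

def box_stats(sales: pd.Series) -> dict:
    """Quartiles and Tukey whisker ends (1.5 x IQR fences) for one group of sales."""
    q1, median, q3 = sales.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Whiskers reach the most extreme observations still inside the fences
    low = sales[sales >= q1 - 1.5 * iqr].min()
    high = sales[sales <= q3 + 1.5 * iqr].max()
    return {'q1': q1, 'median': median, 'q3': q3, 'low': low, 'high': high}

# Toy rows (hypothetical values): the two zero-sales rows are closed-store days
df = pd.DataFrame({
    'month': ['Jan'] * 6,
    'open': [0, 0, 1, 1, 1, 1],
    'sales': [0, 0, 4000, 5000, 6000, 7000],
})

all_days = box_stats(df['sales'])
open_days = box_stats(df.loc[df['open'] == 1, 'sales'])
print('All days: ', all_days)
print('Open days:', open_days)
```

Applied to the real data, the same filter is `df_viz_feat[df_viz_feat['open'] == 1]`, which could be passed to `px.box` in place of `df_viz_feat` if the goal is to describe trading days only.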